ltx-audio: use radix-2 fft to speed up compute_log_mel_spectrogram #1514

Open
stduhpf wants to merge 1 commit into leejet:master from stduhpf:fft-mel-spec

stduhpf (Contributor) commented May 17, 2026

Summary

This PR optimises one part of the LTX2.3 audio VAE, compute_log_mel_spectrogram, by replacing the inlined DFT with a Cooley-Tukey radix-2 FFT. In my experience the output sound quality is indistinguishable, as it should be.

(I also changed the intermediate mel_value from double to float. This technically increases the footprint of this PR a bit, and I don't think it affects performance or quality much either way. I included it in this PR by mistake, but the code is a bit cleaner this way, so I'm leaving it in.)
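
For readers unfamiliar with the technique named in the summary, here is a minimal sketch of an in-place iterative radix-2 Cooley-Tukey FFT next to a naive O(n²) DFT. It is not the code from this PR: the names `fft_radix2` and `dft_naive` are hypothetical, the input length is assumed to be a power of two, and the twiddle factors are recomputed on every butterfly for clarity rather than speed.

```cpp
#include <cassert>
#include <cmath>
#include <complex>
#include <cstddef>
#include <utility>
#include <vector>

using cfloat = std::complex<float>;

// Hypothetical reference: naive O(n^2) DFT, accumulating in double.
std::vector<cfloat> dft_naive(const std::vector<cfloat>& x) {
    const std::size_t n = x.size();
    const double pi = std::acos(-1.0);
    std::vector<cfloat> out(n);
    for (std::size_t k = 0; k < n; ++k) {
        std::complex<double> sum(0.0, 0.0);
        for (std::size_t t = 0; t < n; ++t) {
            // Reduce k*t modulo n to keep the angle small.
            const double angle =
                -2.0 * pi * static_cast<double>((k * t) % n) / static_cast<double>(n);
            sum += std::complex<double>(x[t].real(), x[t].imag()) *
                   std::complex<double>(std::cos(angle), std::sin(angle));
        }
        out[k] = cfloat(static_cast<float>(sum.real()), static_cast<float>(sum.imag()));
    }
    return out;
}

// Hypothetical sketch: in-place iterative radix-2 FFT, O(n log n).
// Assumes a.size() is a power of two.
void fft_radix2(std::vector<cfloat>& a) {
    const std::size_t n = a.size();
    assert(n != 0 && (n & (n - 1)) == 0);

    // Bit-reversal permutation so the butterflies can run in place.
    for (std::size_t i = 1, j = 0; i < n; ++i) {
        std::size_t bit = n >> 1;
        for (; j & bit; bit >>= 1) {
            j ^= bit;
        }
        j ^= bit;
        if (i < j) {
            std::swap(a[i], a[j]);
        }
    }

    // log2(n) butterfly stages over blocks of length 2, 4, ..., n.
    const double pi = std::acos(-1.0);
    for (std::size_t len = 2; len <= n; len <<= 1) {
        const double step = -2.0 * pi / static_cast<double>(len);
        for (std::size_t start = 0; start < n; start += len) {
            for (std::size_t k = 0; k < len / 2; ++k) {
                // Twiddle factor exp(-2*pi*i*k/len), computed in double, stored as float.
                const double angle = step * static_cast<double>(k);
                const cfloat w(static_cast<float>(std::cos(angle)),
                               static_cast<float>(std::sin(angle)));
                const cfloat u = a[start + k];
                const cfloat v = a[start + k + len / 2] * w;
                a[start + k] = u + v;
                a[start + k + len / 2] = u - v;
            }
        }
    }
}
```

Both functions compute the same transform up to floating-point rounding, so `dft_naive` can serve as a correctness check for `fft_radix2` on small inputs. A frame whose length is not a power of two would need zero-padding or a different algorithm, which this sketch does not cover.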

Related Issue / Discussion

Additional Information

Performance comparison, generating a 32x32 video at 24 fps:

  • 121 frames (5s):

| | compute_log_mel_spectrogram() time | Speedup | Total audio decode time | Total speedup |
| --- | --- | --- | --- | --- |
| DFT (master) | 21.315s | 1x | 209.086s | 1x |
| FFT (PR) | 1.284s | 16.6x | 188.718s | 1.1x |

  • 9 frames (0.33s):

| | compute_log_mel_spectrogram() time | Speedup | Total audio decode time | Total speedup |
| --- | --- | --- | --- | --- |
| DFT (master) | 1.566s | 1x | 4.577s | 1x |
| FFT (PR) | 0.119s | 13.2x | 2.873s | 1.6x |
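
The gap between the per-function speedup and the total speedup in these measurements follows from Amdahl's law: only the fraction of the run spent in the accelerated step benefits. The sketch below is an illustration of that arithmetic, not part of the PR; the helper names `step_speedup` and `overall_speedup` are hypothetical.

```cpp
#include <cassert>

// Speedup of a single step: time before divided by time after.
double step_speedup(double before_s, double after_s) {
    assert(after_s > 0.0);
    return before_s / after_s;
}

// Amdahl's law: overall speedup when a fraction p of the run time
// is accelerated by a factor s and the rest is unchanged.
double overall_speedup(double p, double s) {
    assert(p >= 0.0 && p <= 1.0 && s > 0.0);
    return 1.0 / ((1.0 - p) + p / s);
}
```

For the 121-frame run, p = 21.315 / 209.086 (about 0.10) and s = 21.315 / 1.284 (about 16.6), which predicts an overall speedup of roughly 1.1x, in line with the measured total. After the change, the spectrogram step accounts for under 1% of the 121-frame decode time, so further gains would have to come from the rest of the audio decode.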

Checklist

stduhpf (Contributor, Author) commented May 17, 2026

Even with this change, audio decoding remains very slow, especially for longer sequences. Maybe it would be worth investigating temporal tiling there too?

leejet (Owner) commented May 18, 2026

I plan to add the missing operator to ggml, which should make it much faster. If this work takes too long or there are issues with the implementation, I’ll consider merging this PR.
